Exploring performance of Xeon Phi co-processor

Author

  • Mateusz Iwo Dubaniowski
Abstract

The project aims to explore the performance of the Intel Xeon Phi co-processor. We use various parallelisation and vectorisation methods to port an LU decomposition library to the co-processor. Accelerators and co-processors are growing in popularity because of their good energy efficiency and their large potential for further performance improvements. These two factors make co-processors well suited to driving innovation in high performance computing towards its next goal: exascale computing. In response to this demand, Intel has delivered a co-processor designed to fit the requirements of the HPC community, the Intel Many Integrated Core (MIC) architecture, whose most prominent example is the Intel Xeon Phi. The co-processor follows the many-core principle: it provides a large number of slower cores supplemented with vector processing units, which forces a high level of parallelisation on its users. LU factorisation is a matrix operation used in many fields to solve systems of linear equations, invert matrices, and calculate matrix determinants. In this project we port to the Intel Xeon Phi co-processor an LU factorisation algorithm that performs the decomposition by Gaussian elimination. We use various parallelisation techniques, including Intel LEO (Language Extensions for Offload), OpenMP 4.0 pragmas, Intel Cilk array notation, and the ivdep pragma. Furthermore, we examine the effect of data transfer to the co-processor on the overall execution time. The results show that the best performance on Xeon Phi is achieved by using Intel Cilk array notation to vectorise the code and OpenMP 4.0 to parallelise it. With Intel Cilk array notation, the average speed-up across sparse and dense benchmark matrices is 27 times over the single-threaded performance of the host processor. The peak speed-up achieved with this method across the benchmarks attempted is 49 times over a single thread of the host processor.
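The abstract names the kernel and the tools but not the code, so the following is a minimal sketch, not the author's implementation, of in-place LU factorisation by Gaussian elimination in C. It assumes a dense, row-major matrix that needs no row exchanges (for example, a diagonally dominant one), and the function names `lu_decompose` and `lu_determinant` are hypothetical. It uses only standard OpenMP 4.0 pragmas: `parallel for` over the rows updated at each elimination step, and `simd` on the inner row update (standing in for the `ivdep` pragma mentioned above). The Intel-specific pieces evaluated in the project (LEO offload to the card and Cilk array notation) require the Intel compiler, so they appear only as a comment. The determinant helper illustrates one of the uses the abstract lists.

```c
#include <assert.h>
#include <math.h>
#include <stddef.h>

/* In-place LU factorisation (Doolittle form, no pivoting) of a row-major
 * n x n matrix by Gaussian elimination. On return the strictly lower part
 * holds the multipliers of L (its unit diagonal is implied) and the upper
 * part, including the diagonal, holds U.
 * Returns 0 on success, -1 if a (near-)zero pivot is met.
 * Assumption: the matrix needs no row exchanges; general-purpose codes
 * normally add partial pivoting. */
int lu_decompose(double *a, size_t n)
{
    assert(a != NULL && n > 0);
    for (size_t k = 0; k < n; ++k) {
        const double pivot = a[k * n + k];
        if (fabs(pivot) < 1e-12)
            return -1;
        /* For a fixed k, every row i > k reads row k (unchanged in this
         * step) and writes only itself, so rows can be shared among
         * threads. */
        #pragma omp parallel for
        for (size_t i = k + 1; i < n; ++i) {
            double *row_i = &a[i * n];
            const double *row_k = &a[k * n];
            const double l = row_i[k] / pivot;   /* multiplier L[i][k] */
            row_i[k] = l;
            /* Contiguous, dependence-free update: the vectorisation target.
             * In Cilk Plus array notation (Intel compiler only) this loop
             * would read roughly:
             *   row_i[k+1:n-k-1] -= l * row_k[k+1:n-k-1];              */
            #pragma omp simd
            for (size_t j = k + 1; j < n; ++j)
                row_i[j] -= l * row_k[j];
        }
    }
    return 0;
}

/* det(A) = det(L) * det(U); L has a unit diagonal and no rows were
 * swapped, so the determinant is the product of U's diagonal. */
double lu_determinant(const double *lu, size_t n)
{
    double det = 1.0;
    for (size_t k = 0; k < n; ++k)
        det *= lu[k * n + k];
    return det;
}
```

For example, factorising the 3 x 3 matrix with rows (2, 1, 1), (4, 3, 3), (8, 7, 9) leaves U with rows (2, 1, 1), (0, 1, 1), (0, 0, 2) and the multipliers 2, 4, 3 below the diagonal, so `lu_determinant` returns 4. Compiled with `-fopenmp` the row loop runs in parallel. Without that flag the pragmas are ignored and the same source runs serially, which is the kind of single-threaded baseline the speed-ups above are quoted against.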

Similar resources

Improving Main Memory Hash Joins on Intel Xeon Phi Processors: An Experimental Approach

Modern processor technologies have driven new designs and implementations in main-memory hash joins. Recently, Intel Many Integrated Core (MIC) co-processors (commonly known as Xeon Phi) embrace emerging x86 single-chip many-core techniques. Compared with contemporary multi-core CPUs, Xeon Phi has quite different architectural features: wider SIMD instructions, many cores and hardware contexts, ...

Co-design of a Particle-in-Cell Plasma Simulation Code for Intel Xeon Phi: A First Look at Knights Landing

Three dimensional particle-in-cell laser-plasma simulation is an important area of computational physics. Solving state-of-the-art problems requires large-scale simulation on a supercomputer using specialized codes. A growing demand in computational resources inspires research in improving efficiency and co-design for supercomputers based on manycore architectures. This paper presents first per...

Speeding up lattice sieve with Xeon Phi coprocessor

A major substep in a lattice sieve algorithm that solves the Euclidean shortest vector problem (SVP) is the computation of sums and Euclidean norms of many vector pairs. Finding a solution to the SVP is the foundation of an attack against many lattice-based cryptosystems. We optimize the main subfunction of a sieve for the regular main processor and for the co-processor to speed up the algorith...

Towards Modeling Energy Consumption of Xeon Phi

In the push for exascale computing, energy efficiency is of utmost concern. System architectures often adopt accelerators to hasten application execution at the cost of power. The Intel Xeon Phi co-processor is a unique accelerator that offers application designers high degrees of parallelism, energy-efficient cores, and various execution modes. To explore the vast number of available configurati...

Heterogeneous High Throughput Scientific Computing with APM X-Gene and Intel Xeon Phi

Electrical power requirements will be a constraint on the future growth of Distributed High Throughput Computing (DHTC) as used by High Energy Physics. Performance-per-watt is a critical metric for the evaluation of computer architectures for cost-efficient computing. Additionally, future performance growth will come from heterogeneous, many-core, and high computing density platforms with specia...

Publication date: 2015